Compass: A hybrid method for clinical and biobank data mining
نویسندگان
چکیده
We describe a new method for identification of confident associations within large clinical data sets. The method is a hybrid of two existing methods; Self-Organizing Maps and Association Mining. We utilize Self-Organizing Maps as the initial step to reduce the search space, and then apply Association Mining in order to find association rules. We demonstrate that this procedure has a number of advantages compared to traditional Association Mining; it allows for handling numerical variables without a priori binning and is able to generate variable groups which act as "hotspots" for statistically significant associations. We showcase the method on infertility-related data from Danish military conscripts. The clinical data we analyzed contained both categorical type questionnaire data and continuous variables generated from biological measurements, including missing values. From this data set, we successfully generated a number of interesting association rules, which relate an observation with a specific consequence and the p-value for that finding. Additionally, we demonstrate that the method can be used on non-clinical data containing chemical-disease associations in order to find associations between different phenotypes, such as prostate cancer and breast cancer.
منابع مشابه
Estimation of geochemical elements using a hybrid neural network-Gustafson-Kessel algorithm
Bearing in mind that lack of data is a common problem in the study of porphyry copper mining exploration, our goal was set to identify the hidden patterns within the data and to extend the information to the data-less areas. To do this, the combination of pattern recognition techniques has been used. In this work, multi-layer neural network was used to estimate the concentration of geochemical ...
متن کاملThe Significance of Biobanking in the Sustainability of Biomedical Research: A Review
Biobank, defined as a functional unit for facilitating and improving research by storing biospecimen and their accompanying data, is a key resource for advancement in life science. The history of biobanking goes back to the time of archiving pathology samples. Nowadays, biobanks have considerably improved and are classified into two categories: diseased-oriented and population-based biobanks. U...
متن کاملGraph Hybrid Summarization
One solution to process and analysis of massive graphs is summarization. Generating a high quality summary is the main challenge of graph summarization. In the aims of generating a summary with a better quality for a given attributed graph, both structural and attribute similarities must be considered. There are two measures named density and entropy to evaluate the quality of structural and at...
متن کاملFeature extraction of hyperspectral images using boundary semi-labeled samples and hybrid criterion
Feature extraction is a very important preprocessing step for classification of hyperspectral images. The linear discriminant analysis (LDA) method fails to work in small sample size situations. Moreover, LDA has poor efficiency for non-Gaussian data. LDA is optimized by a global criterion. Thus, it is not sufficiently flexible to cope with the multi-modal distributed data. We propose a new fea...
متن کاملA Hybrid DEA Based CHAID and Imperialist Competitive Algorithm for Stock Selection
In this paper, the investment portfolio is formed based on the data mining algorithm of CHAID on the basis of the risk status criteria. In the next step, the second investment portfolio is created based on the decision rules extracted by the DEA-BCC model. The final portfolio is created through a two-objective mathematical programming model based on the Imperialist Competitive algorithm.
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Journal of biomedical informatics
دوره 47 شماره
صفحات -
تاریخ انتشار 2014